Bytecode and the Compiler
Three Functions That Look Different but Compile to Almost the Same Bytecode
Start with this puzzle:
import dis

def version_a(x):
    """Explicit multiplication."""
    return x * 2

def version_b(x):
    """Left shift - bit trick for multiply by 2."""
    return x << 1

def version_c(x):
    """Using a constant expression."""
    TWO = 1 + 1
    return x * TWO
Intuition says version_b (bit shift) might be faster than version_a (multiply), and that version_c adds an extra variable. Now look at the bytecode:
print("=== version_a ===")
dis.dis(version_a)
print("\n=== version_b ===")
dis.dis(version_b)
print("\n=== version_c ===")
dis.dis(version_c)
=== version_a ===
3 RESUME 0
4 LOAD_FAST 0 (x)
LOAD_CONST 1 (2)
BINARY_OP 5 (*)
RETURN_VALUE
=== version_b ===
7 RESUME 0
8 LOAD_FAST 0 (x)
LOAD_CONST 1 (1)
BINARY_OP 3 (<<)
RETURN_VALUE
=== version_c ===
11 RESUME 0
12 LOAD_CONST 1 (2) ← TWO = 1+1 folded to 2
STORE_FAST 1 (TWO)
13 LOAD_FAST 0 (x)
LOAD_FAST 1 (TWO)
BINARY_OP 5 (*)
RETURN_VALUE
version_a and version_b are structurally identical: same opcode count, same stack depth, differing only in the BINARY_OP argument. version_c has an extra STORE_FAST/LOAD_FAST pair, so it does slightly more work than the others, but the 1 + 1 expression was folded to 2 at compile time, so you never pay for the addition at runtime.
This is the bytecode compiler in action. Understanding it lets you predict what "optimisations" actually help and which are illusions.
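The comparison above can also be checked programmatically instead of by eye. The following is a minimal sketch, not a benchmark: the opnames helper is a name introduced here for illustration, and it only counts instructions, so it says nothing about actual timings.

```python
import dis

def version_a(x):
    return x * 2

def version_b(x):
    return x << 1

def version_c(x):
    TWO = 1 + 1
    return x * TWO

def opnames(func):
    """Opcode names of func, ignoring version-specific bookkeeping instructions."""
    skip = {"RESUME", "CACHE", "NOP"}
    return [i.opname for i in dis.get_instructions(func) if i.opname not in skip]

a_ops, b_ops, c_ops = opnames(version_a), opnames(version_b), opnames(version_c)

# Same instruction count and same stack depth for multiply and shift
print(len(a_ops), len(b_ops))
print(version_a.__code__.co_stacksize, version_b.__code__.co_stacksize)

# version_c carries extra instructions for TWO, but only one arithmetic
# instruction survives: the 1 + 1 was folded away at compile time
binary_ops_in_c = [name for name in c_ops if name.startswith("BINARY_")]
print(len(c_ops), binary_ops_in_c)
```

Exact opcode names differ between CPython versions (for example BINARY_MULTIPLY on 3.10 versus BINARY_OP on 3.11+), which is why the sketch compares counts rather than names.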
The Compilation Pipeline
Python source goes through five stages before it becomes an executable code object:
Source text: "def f(x): return x + 1"
│
▼
┌────────────────┐
│ Tokeniser │ Parser/tokenizer.c
│ │ Text → token stream
└────────────────┘ NAME('def') NAME('f') OP('(') NAME('x') OP(')')
│ OP(':') NAME('return') NAME('x') OP('+') NUMBER('1')
▼
┌────────────────┐
│ PEG Parser │ Parser/parser.c (generated from Grammar/python.gram)
│ (since 3.9) │ Tokens → Abstract Syntax Tree, built directly (no intermediate CST)
└────────────────┘ Validates grammar: SyntaxError raised here
│
▼
┌────────────────┐
│ AST Optimiser │ Python/ast_opt.c
│ │ Simplifies the AST (e.g. constant folding)
└────────────────┘ FunctionDef(name='f', args=[arg(arg='x')],
│ body=[Return(value=BinOp(left=Name('x'), op=Add(),
│ right=Constant(value=1)))])
▼
┌────────────────┐
│ Symbol Table │ Python/symtable.c
│ │ Analyses scopes: which names are local/global/free?
└────────────────┘ 'x' → LOCAL (assigned as argument), 'f' → GLOBAL
│
▼
┌────────────────┐
│ Compiler │ Python/compile.c
│ │ AST + symbol table → bytecode
└────────────────┘ Emits: RESUME 0, LOAD_FAST 0, LOAD_CONST 1, BINARY_OP 0,
│ RETURN_VALUE
▼
┌────────────────┐
│ PyCodeObject │ Immutable, can be serialised to .pyc
│ │ co_code: b'\x97\x00...' (raw bytes)
└────────────────┘ co_consts: (None, 1)
co_varnames: ('x',)
co_names: ()
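Most of these stages have a standard-library counterpart, so the pipeline can be followed from Python itself. This is a rough sketch: the tokenize and symtable modules are Python-level views of the C stages rather than the C code paths themselves, and the "<demo>" filename is a placeholder chosen for this example.

```python
import ast
import io
import symtable
import tokenize

source = "def f(x): return x + 1\n"

# Tokens: keep only tokens with visible text (drops NEWLINE and ENDMARKER)
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.string.strip()]
print(tokens)

# AST: a Module whose first statement is a FunctionDef wrapping a BinOp
tree = ast.parse(source)
func_node = tree.body[0]
print(type(func_node).__name__, type(func_node.body[0].value).__name__)

# Symbol table: the function's scope knows that x is a parameter
module_table = symtable.symtable(source, "<demo>", "exec")
func_table = module_table.get_children()[0]
print(func_table.get_name(), func_table.lookup("x").is_parameter())

# Code object: compile the AST, run the def statement, then call f
module_code = compile(tree, "<demo>", "exec")
namespace = {}
exec(module_code, namespace)
print(namespace["f"](41))
```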
Working with the AST
The ast module exposes the compiler's internal representation:
import ast

source = """
def greet(name, times=1):
    for _ in range(times):
        print(f"Hello, {name}!")
    return None
"""

# Parse to AST
tree = ast.parse(source)
print(ast.dump(tree, indent=2))
Output (abbreviated):
Module(
  body=[
    FunctionDef(
      name='greet',
      args=arguments(
        args=[arg(arg='name'), arg(arg='times')],
        defaults=[Constant(value=1)]),
      body=[
        For(
          target=Name(id='_', ctx=Store()),
          iter=Call(func=Name(id='range'), args=[Name(id='times')]),
          body=[
            Expr(value=Call(
              func=Name(id='print'),
              args=[JoinedStr(...)]))]),
        Return(value=Constant(value=None))],
      returns=None)],
  type_ignores=[])
Walking the AST to collect all function names:
import ast

class FunctionCollector(ast.NodeVisitor):
    def __init__(self):
        self.functions = []

    def visit_FunctionDef(self, node):
        self.functions.append(node.name)
        self.generic_visit(node)  # Continue walking into nested functions

    def visit_AsyncFunctionDef(self, node):
        self.functions.append(f"async {node.name}")
        self.generic_visit(node)

source = """
def outer():
    async def inner():
        pass
def helper():
    pass
"""
tree = ast.parse(source)
collector = FunctionCollector()
collector.visit(tree)
print(collector.functions)  # ['outer', 'async inner', 'helper']
Modifying the AST before compilation (a technique used by testing frameworks):
import ast

class AssertRewriter(ast.NodeTransformer):
    """Rewrite a bare 'assert expr' so its message includes the expression's source text."""
    def visit_Assert(self, node):
        # Transform: assert x == y
        # Into:      assert x == y, "Assertion failed: x == y"
        if node.msg is None:
            # ast.unparse (Python 3.9+) turns the test expression back into source text
            expr_src = ast.unparse(node.test)
            node.msg = ast.Constant(value=f"Assertion failed: {expr_src}")
        return node

source = """
x = 5
y = 6
assert x == y
"""
tree = ast.parse(source)
rewriter = AssertRewriter()
new_tree = rewriter.visit(tree)
ast.fix_missing_locations(new_tree)

# Compile and run the modified AST
code = compile(new_tree, '<string>', 'exec')
try:
    exec(code)
except AssertionError as e:
    print(f"AssertionError: {e}")
# AssertionError: Assertion failed: x == y
The dis Module: Complete Reference
import dis

def example(items, threshold=10):
    result = []
    for item in items:
        if item > threshold:
            result.append(item)
    return result

# Basic disassembly
dis.dis(example)

# Structured access to bytecode
print("\n--- Bytecode objects ---")
bc = dis.Bytecode(example)
for instr in bc:
    print(f" offset={instr.offset:3d} "
          f"opname={instr.opname:<20s} "
          f"arg={str(instr.arg):<5s} "
          f"argval={instr.argval!r}")

# Get all instructions as a list
instructions = list(dis.get_instructions(example))
print(f"\nTotal instructions: {len(instructions)}")

# Code object details
code = example.__code__
print(f"\nco_varnames: {code.co_varnames}")
print(f"co_consts: {code.co_consts}")
print(f"co_names: {code.co_names}")
print(f"co_argcount: {code.co_argcount}")
print(f"co_flags: {code.co_flags:#010x}")
print(f"co_stacksize: {code.co_stacksize}")

# Disassemble a string directly
dis.dis("x = a + b * c")
Important Opcodes Explained
| Opcode | What it does | When generated |
|---|---|---|
| LOAD_FAST | Push localsplus[i] onto stack | Local variable read |
| STORE_FAST | Pop stack top, store at localsplus[i] | Local variable write |
| LOAD_GLOBAL | Dict lookup in globals then builtins | Global name read |
| STORE_GLOBAL | Store into globals dict | Global variable write |
| LOAD_CONST | Push co_consts[i] | Literal (number, string, None) |
| LOAD_ATTR | Pop object, push getattr(obj, name) | obj.name access |
| STORE_ATTR | Pop value and obj, call setattr | obj.name = value |
| BINARY_OP | Pop two items, apply operator, push result | a + b, a * b, etc. |
| CALL | Call a callable with N args from stack | Any function call |
| RETURN_VALUE | Pop top of stack, return to caller | return expr |
| FOR_ITER | Call __next__() on TOS; jump if StopIteration | Inside a for loop |
| GET_ITER | Call iter() on TOS; push iterator | for x in iterable: |
| BUILD_LIST | Pop N items, construct list, push | [a, b, c] literal |
| BUILD_MAP | Pop N key-value pairs, build dict | {k: v, ...} literal |
| JUMP_FORWARD | Unconditional forward jump | End of if branch |
| POP_JUMP_IF_FALSE | Pop TOS; jump if it is falsy | if condition: |
| PUSH_EXC_INFO | Push exception info for handler | except clause |
| MAKE_FUNCTION | Create function object from code object | def f(): |
| LOAD_CLOSURE | Load a cell variable for a closure | Closure creation |
| COPY_FREE_VARS | Copy free vars from closure into frame | Closure execution |
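To connect the table to real code, a short sketch can collect the opcodes a function actually uses and check a few rows against it. The function is the same example used in the dis reference; only opcodes whose names have been stable across recent CPython releases are checked, since comparison and jump opcodes have been renamed between versions.

```python
import dis

def example(items, threshold=10):
    result = []
    for item in items:
        if item > threshold:
            result.append(item)
    return result

# Set of distinct opcode names appearing in the compiled function
used = {instr.opname for instr in dis.get_instructions(example)}

# result = []        -> BUILD_LIST
# for item in items  -> GET_ITER followed by FOR_ITER
# return result      -> RETURN_VALUE
for name in ("BUILD_LIST", "GET_ITER", "FOR_ITER", "RETURN_VALUE"):
    print(f"{name:<14} {'present' if name in used else 'absent'}")
```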
.pyc Files: Bytecode Caching
When CPython imports a module, it caches the compiled bytecode in a .pyc file to avoid recompiling on the next run:
mymodule.py (source - human-readable)
__pycache__/
mymodule.cpython-312.pyc (compiled bytecode)
The .pyc file format:
Byte offset Size Content
──────────────────────────────────────────────────────────
0 4 Magic number: 2-byte bytecode version (little-endian) followed by b'\r\n'
e.g., bytes cb 0d 0d 0a (version 3531) for CPython 3.12
4 4 Bit field: bit 0 clear = timestamp-based, bit 0 set = hash-based; bit 1 = check source (hash-based only)
8 4 Source mtime (timestamp-based only)
12 4 Source file size in bytes (timestamp-based only)
8 8 Source hash (hash-based only; occupies the mtime and size slots)
16 ... marshal-serialised PyCodeObject
The magic number changes with every CPython release that changes bytecode semantics. If the magic number in a .pyc does not match the running interpreter, the .pyc is ignored and the source is recompiled.
import importlib.util
import marshal
import os
import struct
import time

def read_pyc(path):
    """Read and decode a .pyc file."""
    with open(path, 'rb') as f:
        magic = f.read(4)
        bit_field = struct.unpack('<I', f.read(4))[0]
        if bit_field & 1:  # Hash-based .pyc
            source_hash = f.read(8)
            print(f"Hash-based .pyc, source hash: {source_hash.hex()}")
        else:  # Timestamp-based .pyc
            timestamp = struct.unpack('<I', f.read(4))[0]
            size = struct.unpack('<I', f.read(4))[0]
            print(f"Timestamp: {time.ctime(timestamp)}, size: {size}")
        print(f"Magic: {magic.hex()}")
        code = marshal.load(f)
    print(f"Code object: {code}")
    print(f"co_filename: {code.co_filename}")
    return code

# Locate the cached .pyc of a stdlib package that is already compiled
import json
# cache_from_source maps a source path to its __pycache__ counterpart
json_pyc = importlib.util.cache_from_source(json.__file__)
if os.path.exists(json_pyc):
    code = read_pyc(json_pyc)
    print(f"co_names[:5]: {code.co_names[:5]}")
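The header layout can also be checked without relying on an existing cache file. The sketch below compiles a throwaway module (demo_mod.py is a hypothetical name used only here) into a temporary directory and unpacks the 16-byte header. Timestamp mode is requested explicitly because the SOURCE_DATE_EPOCH environment variable can otherwise switch py_compile to hash-based output.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo_mod.py")  # hypothetical throwaway module
    with open(src_path, "w") as f:
        f.write("ANSWER = 42\n")

    pyc_path = py_compile.compile(
        src_path,
        cfile=os.path.join(tmp, "demo_mod.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    source_size = os.path.getsize(src_path)

magic = header[:4]
flags, mtime, size = struct.unpack("<III", header[4:16])

print("magic matches interpreter:", magic == importlib.util.MAGIC_NUMBER)
print("bytecode version:", int.from_bytes(magic[:2], "little"))
print("flags:", flags, "mtime:", mtime, "size:", size)
```

The bytecode version printed here depends on the interpreter running the sketch, so no particular value should be expected.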
Code Objects: The Compiled Artefact
A PyCodeObject is the result of compiling a function, class, or module. It is immutable, hashable, and can be serialised with marshal. Every function object holds a reference to a code object.
import dis

def outer():
    x = 10
    def inner(y):
        return x + y  # 'x' is a free variable (captured from outer)
    return inner

code = outer.__code__
print("=== outer ===")
print(f"co_varnames: {code.co_varnames}")  # ('inner',) - x lives in a cell, not a fast local
print(f"co_cellvars: {code.co_cellvars}")  # ('x',) - x is captured by inner
print(f"co_freevars: {code.co_freevars}")  # () - outer has no free vars
print(f"co_consts: {code.co_consts}")  # (None, 10, <code object inner>)
print(f"co_nlocals: {code.co_nlocals}")  # 1 - counts co_varnames only

inner_code = outer.__code__.co_consts[2]  # The nested code object
print("\n=== inner ===")
print(f"co_varnames: {inner_code.co_varnames}")  # ('y',)
print(f"co_freevars: {inner_code.co_freevars}")  # ('x',) - captured from outer
print(f"co_cellvars: {inner_code.co_cellvars}")  # ()

# co_flags encodes various function properties as a bitmask
print(f"\nouter co_flags: {code.co_flags:#010x}")
# Bit 0x04: *args  Bit 0x08: **kwargs  Bit 0x10: nested  Bit 0x20: generator
print(f"inner co_flags: {inner_code.co_flags:#010x}")
# Should have the NESTED flag (0x10) set
Peephole Optimisation and Constant Folding
CPython's compiler applies constant folding and basic dead code elimination. In Python 3.7 through 3.13, constant folding is done in the AST optimiser pass (rather than in the bytecode peephole pass used by older releases):
import dis

# 1. Constant folding: arithmetic on literals
def folded_arithmetic():
    return 60 * 60 * 24 * 365  # Should be precomputed to 31536000
dis.dis(folded_arithmetic)
# LOAD_CONST 1 (31536000) ← the entire expression is one constant
# RETURN_VALUE
# (3.12 and 3.13 fuse a constant return into a single RETURN_CONST)

# 2. String concatenation of literals
def folded_string():
    return "hello" + " " + "world"
dis.dis(folded_string)
# LOAD_CONST 1 ('hello world') ← folded at compile time
# RETURN_VALUE

# 3. Tuple of constants (used in 'in' membership tests)
def folded_tuple():
    x = 5
    return x in (1, 2, 3, 4, 5)  # Tuple of constants → LOAD_CONST
dis.dis(folded_tuple)
# LOAD_CONST 2 ((1, 2, 3, 4, 5)) ← tuple is a single constant
# CONTAINS_OP 0

# 4. Dead code after return
def dead_code():
    return 42
    x = 100  # Never executed
    print(x)  # Never executed
dis.dis(dead_code)
# LOAD_CONST 1 (42)
# RETURN_VALUE
# ← Dead code after RETURN_VALUE is eliminated

# 5. What does NOT get folded?
def not_folded(x):
    return x * (60 * 60)  # The literal 3600 is folded, but x * 3600 is not
dis.dis(not_folded)
# LOAD_FAST 0 (x)
# LOAD_CONST 1 (3600) ← 60*60 was folded to 3600
# BINARY_OP 5 (*) ← x * 3600 happens at runtime
# RETURN_VALUE
What CPython does NOT optimise at the bytecode level (unlike ahead-of-time compilers for languages such as C or Go):
- Loop-invariant code motion
- Inlining function calls
- Strength reduction (e.g., replacing integer division by powers of 2 with right shifts)
- Dead store elimination
- Alias analysis
These are the domain of PyPy's JIT compiler and tools like Cython or Numba, not CPython's ahead-of-time compiler.
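The strength-reduction item is easy to confirm. The sketch below disassembles a floor division by 2 and looks for anything resembling a shift; halve is a name made up for this illustration. The compiler cannot apply the rewrite anyway, because it does not know that n is an int rather than some other type defining __floordiv__.

```python
import dis

def halve(n):
    return n // 2

instructions = list(dis.get_instructions(halve))

# A strength-reducing compiler would emit a right shift here; look for one
shift_like = [i for i in instructions
              if "SHIFT" in i.opname or ">>" in (i.argrepr or "")]

print([(i.opname, i.argrepr) for i in instructions])
print("rewritten to a shift:", bool(shift_like))
```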
Python 3.11+ Specialising Adaptive Interpreter
Python 3.11 introduced a fundamentally new optimisation strategy: specialisation. Instead of one generic opcode per operation, the interpreter observes what types flow through each opcode site and replaces the generic opcode with a specialised variant tuned for those types.
import dis

def add_integers(a, b):
    return a + b

# First, look at the "cold" bytecode (before specialisation)
dis.dis(add_integers)
# LOAD_FAST 0 (a)
# LOAD_FAST 1 (b)
# BINARY_OP 0 (+) ← Generic opcode
# RETURN_VALUE

# Call the function many times to trigger specialisation
for _ in range(100):
    add_integers(1, 2)

# On Python 3.11+, adaptive=True shows the specialised instructions in place.
# (opcode.opmap lists only the generic opcodes, so it cannot be used for this.)
dis.dis(add_integers, adaptive=True)
# BINARY_OP_ADD_INT 0 (+) ← specialised variant replaced the generic BINARY_OP
How specialisation works:
Execution 1: BINARY_OP (+) executes with int + int
→ Increment specialisation counter
Execution 2-7: Same - counter increments
Execution 8 (specialisation threshold ~8):
→ Observe: both operands are always int
→ Replace BINARY_OP with BINARY_OP_ADD_INT
Execution 9+: BINARY_OP_ADD_INT executes:
→ Checks: is left an int? is right an int?
→ If yes: adds the two ints directly via _PyLong_Add() - skips type dispatch entirely
→ If no: "deoptimises" back to generic BINARY_OP
Specialised opcode families (Python 3.11-3.13):
| Generic Opcode | Specialised Variant | Condition |
|---|---|---|
| BINARY_OP + | BINARY_OP_ADD_INT | Both operands are int |
| BINARY_OP + | BINARY_OP_ADD_FLOAT | Both operands are float |
| BINARY_OP + | BINARY_OP_ADD_UNICODE | Both operands are str |
| LOAD_GLOBAL | LOAD_GLOBAL_MODULE | Found in module dict |
| LOAD_GLOBAL | LOAD_GLOBAL_BUILTIN | Found in builtins dict |
| LOAD_ATTR | LOAD_ATTR_SLOT | Attribute is a __slots__ member |
| LOAD_ATTR | LOAD_ATTR_WITH_HINT | Instance __dict__ with cached index |
| CALL | CALL_PY_EXACT_ARGS | Python function, exact arg count |
| CALL | CALL_BUILTIN_FAST | C builtin with positional args |
The specialisation is adaptive - if the types change (a function that was always called with integers starts receiving floats), the specialised opcode "deoptimises" back to the generic form and the counter resets.
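A before-and-after listing makes the adaptive behaviour visible. This is a sketch under some assumptions: the adaptive keyword of dis.get_instructions exists only on Python 3.11+, the warm-up count of 1000 is an arbitrary choice well above the threshold, and builds with specialisation disabled will show no change. The helper name opnames is introduced here for illustration.

```python
import dis
import sys

def add_numbers(a, b):
    return a + b

def opnames(func):
    """Opcode names for func; on 3.11+ request the specialised (adaptive) view."""
    if sys.version_info >= (3, 11):
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        instructions = dis.get_instructions(func)
    return [i.opname for i in instructions]

before = opnames(add_numbers)

# Warm up with ints only, so the addition site sees a stable type pair
for _ in range(1000):
    add_numbers(1, 2)

after = opnames(add_numbers)

print("before:", before)
print("after: ", after)
```

On interpreters that specialise, the second listing typically shows a variant such as BINARY_OP_ADD_INT where the first shows the generic BINARY_OP; the exact names depend on the CPython version.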
Modifying Bytecode at Runtime
Python's types.CodeType is the Python-accessible version of PyCodeObject. It is immutable, but you can create a modified copy and replace a function's __code__ attribute:
import dis

def original(x):
    """Return x * 2."""
    return x * 2

# Modify this function to return x * 3 instead
# by replacing the constant 2 with 3 in co_consts
original_code = original.__code__
print("Original co_consts:", original_code.co_consts)  # ('Return x * 2.', 2)

# Create a new code object with modified constants (code.replace() needs Python 3.8+).
# Swap only the 2 so every other constant, including the docstring, keeps its index.
new_consts = tuple(3 if c == 2 else c for c in original_code.co_consts)
modified_code = original_code.replace(co_consts=new_consts)

# Replace the function's code object
original.__code__ = modified_code

# Verify
print(original(5))  # Should now print 15, not 10
dis.dis(original)
# LOAD_FAST 0 (x)
# LOAD_CONST 1 (3) ← Now 3 instead of 2
# BINARY_OP 5 (*)
# RETURN_VALUE
For timing and profiling, bytecode surgery is rarely the right tool. A Python-level wrapper or the interpreter's built-in profiling hooks do the job more safely:
import cProfile
import functools
import io
import pstats
import time

def add_timing(func):
    """Time each call with a Python-level wrapper (no bytecode changes needed)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed*1000:.3f}ms")
        return result
    return wrapper

# The built-in route: cProfile hooks the interpreter's profiling callbacks
# (the sys.setprofile mechanism; sys.monitoring on 3.12+)
def expensive_computation():
    total = 0
    for i in range(100_000):
        total += i ** 2
    return total

pr = cProfile.Profile()
pr.enable()
expensive_computation()
pr.disable()

s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats(5)
print(s.getvalue())
compile() and exec(): Dynamic Code Execution
import ast
import dis
# compile() gives you a code object you can exec or inspect
code = compile("x = 1 + 2", "<string>", "exec")
print(type(code)) # <class 'code'>
dis.dis(code)
# exec in a custom namespace
namespace = {}
exec(code, namespace)
print(namespace['x']) # 3
# eval() for expressions
result = eval("2 ** 10 + 1")
print(result) # 1025
# Compile an AST directly - powerful for code generation
tree = ast.parse("result = [x * x for x in range(10)]")
# Modify the AST...
ast.fix_missing_locations(tree)
code = compile(tree, "<generated>", "exec")
ns = {}
exec(code, ns)
print(ns['result']) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# Different modes:
# 'exec' - a sequence of statements, compiled as a module body
# 'eval' - single expression (must return a value)
# 'single' - single interactive statement (prints result like REPL)
# Security: exec with restricted namespaces does NOT provide real sandboxing
# Malicious code can escape any namespace restriction in Python
# Use subprocess/containers for untrusted code execution
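The three modes behave differently enough that a side-by-side run helps. The following is a small sketch using only built-ins; the "<expr>", "<stmt>", "<stmts>", and "<single>" filenames are arbitrary labels chosen for this example.

```python
import contextlib
import io

# 'eval' accepts exactly one expression and yields its value
expr_code = compile("2 ** 10 + 1", "<expr>", "eval")
eval_result = eval(expr_code)
print(eval_result)

# 'eval' rejects statements at compile time
rejected = False
try:
    compile("x = 1", "<stmt>", "eval")
except SyntaxError:
    rejected = True
print("statement rejected in eval mode:", rejected)

# 'exec' accepts any sequence of statements and returns nothing
ns = {}
exec(compile("x = 1\ny = x + 1", "<stmts>", "exec"), ns)
print(ns["y"])

# 'single' echoes expression values through sys.displayhook, like the REPL
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(compile("6 * 7", "<single>", "single"))
single_output = buffer.getvalue()
print(repr(single_output))
```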
Interview Q&A
Q1: What is a Python code object, and how does it differ from a function object?
A code object (PyCodeObject in C, code in Python) is the compiled representation of a function body - produced once at compile time (or import time) and shared across all calls to the function. It contains everything about the function's structure: co_code (bytecode as bytes), co_consts (a tuple of literal values), co_varnames (local variable names in index order), co_names (global/attribute names), co_freevars (captured variable names), co_argcount, co_stacksize, and source location information. A code object is immutable.
A function object (PyFunctionObject in C, function in Python) wraps a code object with the runtime context needed to call it: __globals__ (the module's global namespace dict), __defaults__ (default argument values), __closure__ (a tuple of cell objects for captured variables), __doc__, __name__, and __dict__ (for function attributes). A function object is created each time a def statement executes (not when the module is compiled). Multiple function objects can share the same code object - for example, a function defined inside a loop creates N function objects but only one code object.
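The loop case from the answer above can be demonstrated in a few lines. This is a minimal sketch: make_adders is a made-up helper, and the n=n default argument is used so that each function keeps its own value without needing a closure.

```python
def make_adders(count):
    """Execute the same def statement `count` times and collect the results."""
    adders = []
    for n in range(count):
        def add(x, n=n):  # default argument pins the current n
            return x + n
        adders.append(add)
    return adders

adders = make_adders(3)

# Three distinct function objects with different defaults...
print([f(10) for f in adders])
print(len({id(f) for f in adders}))

# ...all wrapping the single code object compiled for `add`
print(len({id(f.__code__) for f in adders}))
```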
Q2: What does the Python compiler optimise? What are its limitations compared to compiled languages?
CPython's compiler performs these optimisations: (1) Constant folding - 60 * 60 * 24 is evaluated at compile time to 86400, stored as a single LOAD_CONST; (2) String literal concatenation - adjacent string literals are folded; (3) Tuple constants - (1, 2, 3) in a membership test is a single constant; (4) Dead code elimination - code after an unconditional return is removed; (5) Boolean constant simplification - not True → False.
The limitations are significant: no loop-invariant code motion, no function inlining, no strength reduction (integer division by power-of-2 → right shift), no alias analysis, no register allocation (the eval loop is always a stack machine), and no type-based optimisation in the standard compiler. Through Python 3.12 CPython has no JIT tier; 3.13 adds an experimental copy-and-patch JIT (PEP 744) that is disabled by default. The specialising adaptive interpreter (3.11+) does opcode specialisation based on observed types at runtime, but it operates at the opcode level, not the native code level. PyPy, Cython, and Numba fill the gap for compute-intensive code.
Q3: What is the specialising adaptive interpreter introduced in Python 3.11? How does it speed up code?
The specialising adaptive interpreter (PEP 659) is a mechanism where individual bytecode instruction sites adapt based on the types they observe at runtime. Each opcode site has a specialisation counter. When the counter reaches a threshold (~8 executions), the interpreter examines the types of the operands and replaces the generic opcode with a specialised variant that is tuned for those types.
For example, BINARY_OP (+) becomes BINARY_OP_ADD_INT when both operands are always integers. The specialised version skips the normal type dispatch (which traverses tp_as_number->nb_add through the type object), instead calling _PyLong_Add() directly after a fast type check. For LOAD_GLOBAL, the specialised LOAD_GLOBAL_MODULE version caches the dict version number and offset, turning a hash table lookup into a direct array access on cache hits.
If the types change (deoptimisation trigger), the guard in the specialised opcode fails, execution falls back to the generic version, and the counter resets so the site can specialise again later. This is safe at the cost of occasional deoptimisation overhead. The CPython release notes report Python 3.11 as 10-60% faster than 3.10 depending on workload, about 1.25x on average on the pyperformance suite; specialisation is one of several contributors to that figure.
Q4: What is a .pyc file and when is it invalidated?
A .pyc file is a bytecode cache - a serialised PyCodeObject stored on disk so that subsequent imports do not need to recompile the source. It lives in a __pycache__ directory next to the source file, named with the Python version: module.cpython-312.pyc.
The file starts with a 4-byte magic number that encodes the Python version and a compiler version counter. If the magic number does not match the running interpreter, the .pyc is rejected and the source is recompiled.
For source validation, CPython supports two modes (PEP 552): (1) Timestamp-based (default) - the .pyc stores the source file's mtime and size. On import, these are checked against the current mtime/size; if they differ, the source is recompiled. (2) Hash-based (opt-in via python -m compileall --invalidation-mode checked-hash or unchecked-hash, or via the py_compile module) - the .pyc stores a hash of the source content. This is more reliable in environments where file timestamps are unreliable (e.g., CI/CD systems, version-controlled deployments). You can create a hash-based .pyc with py_compile.compile('file.py', invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH).
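The hash-based mode can be verified the same way as the timestamp layout. The sketch below writes a throwaway module (demo_mod.py is a hypothetical name), compiles it with CHECKED_HASH, and compares the 8 bytes stored in the header against importlib.util.source_hash of the same source bytes.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

source_bytes = b"ANSWER = 42\n"

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo_mod.py")  # hypothetical throwaway module
    with open(src_path, "wb") as f:
        f.write(source_bytes)

    pyc_path = py_compile.compile(
        src_path,
        cfile=os.path.join(tmp, "demo_mod.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc_path, "rb") as f:
        header = f.read(16)

flags = struct.unpack("<I", header[4:8])[0]
stored_hash = header[8:16]

# Bit 0: hash-based; bit 1: the source must be re-hashed and checked on import
print(f"flags: {flags:#04b}")
print("hash matches source:", stored_hash == importlib.util.source_hash(source_bytes))
```

With UNCHECKED_HASH the second bit would be clear, meaning the importer trusts the cache without re-reading the source.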
Q5: How do you read and modify Python bytecode at runtime? What are the legitimate use cases?
A function's bytecode is accessible via func.__code__. The dis module provides human-readable disassembly. To modify, create a new code object with code.replace(co_consts=..., co_code=..., ...) (Python 3.8+) and assign it to func.__code__.
Legitimate use cases: (1) Testing frameworks - pytest rewrites assert statements by modifying the AST before compilation to include detailed failure messages; (2) Coverage tools - coverage.py records which lines execute through a sys.settrace-style trace function (and can use sys.monitoring on 3.12+) rather than by rewriting bytecode; (3) Profilers - performance profilers can inject tracing code; (4) Debuggers - pdb uses sys.settrace which hooks into the eval loop's tracing mechanism; (5) Security scanning - static analysis of .pyc files without needing source; (6) Obfuscation - distributing .pyc files without source (limited protection, but used commercially).
Direct bytecode manipulation is fragile because bytecode format changes between Python versions. The AST-based approach (ast.NodeTransformer + compile()) is more stable. For profiling specifically, sys.setprofile and sys.settrace are the right hooks - they integrate with the eval loop's built-in tracing support without requiring bytecode modification.
